QTM 385 - Experimental Methods

Lecture 08 - Blocking, Clustering (cont.), and Statistical Power

Danilo Freire

Emory University

Hi, there!
Nice to see you again! 😉

Group work 👥

Group 1

Pre-formed

Name Login ID SIS ID Email
Elizabeth Shin EJSHIN6 2520422 elizabeth.shin@emory.edu
Emily Choi ECHOI73 2492522 emily.choi@emory.edu
Esther Yang QYANG68 2487073 esther.yang2@emory.edu
Zhiyi Li (Yolanda Li) ZLIT23 2513881 zhiyi.li@emory.edu
Angela Xie JXIE82 2515217 angela.xie@emory.edu

Group 2

Originally Anushka Basu & Annie Cao, now topped up to 4

Name Login ID SIS ID Email
Anushka Basu ABASU9 2551669 anushka.basu@emory.edu
Annie Cao JCAO66 2599315 annie.cao@emory.edu
Courtney Fitzgerald CFITZG4 2484240 courtney.fitzgerald@emory.edu
Adam Pastor AMPASTO 2565464 adam.pastor@emory.edu

Group 3

Pre-formed

Name Login ID SIS ID Email
Maura Dianno MDIANNO 2481848 maura.dianno@emory.edu
Kush Bhatia KBHATI7 2492303 kush.bhatia@emory.edu
Shriya Iyer SAIYER4 2493146 shriya.iyer@emory.edu

Group 4

Pre-formed

Name Login ID SIS ID Email
Sylvia Xing JXING8 2549831 sylvia.xing@emory.edu
Lucy Liu CLIU452 2561533 lucy.liu@emory.edu
Jessie Hao JHAO23 2513298 jessie.hao@emory.edu
Zoe Liu SLIU547 2583239 zoe.liu@emory.edu

Group 5

Dhwani + Anita, Harris + Xinyi

Name Login ID SIS ID Email
Dhwani Venkatarangan DAVENKA 2554493 dhwani.venkatarangan@emory.edu
Anita Osuri AOSURI2 2557540 anita.osuri@emory.edu
Xinyi Wang XWAN878 2549813 xinyi.wang@emory.edu
Harris Wang MWAN467 2551003 harris.wang@emory.edu

Group 6

Randomly assigned

Name Login ID SIS ID Email
Shuyang Yu SYU1025 2610436 shuyang.yu@emory.edu
Phoebe Pan ZPAN66 2630423 ziwen.pan@emory.edu
Zihan Liang ZLIAN57 2609381 zihan.liang@emory.edu
Evelyn Shi CSHI59 2609525 evelyn.shi2@emory.edu

Group 7

Miracle + Ahshar, now topped up to 4

Name Login ID SIS ID Email
Davis Boor DBOOR 2556176 davis.boor@emory.edu
Xipu Wang XWAN884 2551008 xipu.wang@emory.edu
Miracle Ephraim MEPHRAI 2492732 miracle.ephraim@emory.edu
Ahshar Brown AOBROW2 2575182 ahshar.brown@emory.edu

Group 8

Daniel + Howie

Name Login ID SIS ID Email
Howie Brown HJBROW5 2585210 howie.brown@emory.edu
Maxwell Troilo MTROILO 2520874 max.troilo@emory.edu
Daniel Nickas DNICKAS 2549711 daniel.nickas@emory.edu

Brief recap 📚

Blocking recap

  • Blocking involves grouping experimental units based on certain characteristics to ensure comparability between treatment and control groups.
  • Blocks are formed based on variables expected to affect the outcome, and within each block, units are randomly assigned to treatment or control.
  • Blocking reduces variance and increases precision by ensuring balanced groups within each block.
  • Key benefits:
    • Ensures equal representation of important subgroups in treatment and control.
    • Reduces the risk that chance imbalances in important covariates distort the results.
    • Particularly useful for small sample sizes or when heterogeneity is expected.

Source: Bobbit (2020)
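
The blocking procedure recapped above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `block_randomise`, the block labels, and the 12 example units are all hypothetical and do not come from the lecture. Within each block, half of the units are assigned to treatment at random.

```python
import random
from collections import defaultdict


def block_randomise(units, block_of, seed=0):
    """Assign half of each block to treatment (1) and the rest to control (0).

    units:    list of unit identifiers
    block_of: dict mapping each unit to its block label (hypothetical covariate)
    """
    rng = random.Random(seed)

    # Group units by block label.
    blocks = defaultdict(list)
    for u in units:
        blocks[block_of[u]].append(u)

    # Randomise separately inside each block.
    assignment = {}
    for label in sorted(blocks):
        members = blocks[label][:]
        rng.shuffle(members)
        n_treat = len(members) // 2
        for i, u in enumerate(members):
            assignment[u] = 1 if i < n_treat else 0
    return assignment


# Hypothetical example: 12 units, blocked on year of study.
units = list(range(12))
block_of = {u: ("first-year" if u < 6 else "senior") for u in units}
assignment = block_randomise(units, block_of)

for label in ("first-year", "senior"):
    n_treated = sum(z for u, z in assignment.items() if block_of[u] == label)
    print(label, "treated:", n_treated, "of 6")
```

Because assignment happens inside each block, treatment and control are balanced on the blocking variable by construction, whatever the random draw.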

Clustering recap

  • Clustering involves assigning whole groups of units to treatment and control, often due to practical constraints.
  • Common in experiments where individual randomisation is impossible (e.g., classrooms, villages).
  • Clustering introduces intra-cluster correlation (ICC), which measures how similar individuals within the same cluster are.
  • Challenges:
    • Higher variance compared to individual randomisation.
    • Requires cluster-robust standard errors to avoid underestimating uncertainty.
  • Blocking can be combined with clustering (by grouping similar clusters into blocks) to further reduce variance and improve precision.

Clustering (cont.)

  • Last class, we discussed the concept of clustering in experiments
  • Clustering is often used when random assignment of individuals is not feasible, such as in:
    • Education (e.g., classrooms)
    • Healthcare (e.g., hospitals)
    • Community interventions (e.g., neighbourhoods)
  • Clustering can be beneficial, but it also comes with challenges, such as the need for robust statistical methods to account for intra-cluster correlation
  • We also discussed the importance of considering the design effect when planning a clustered experiment
    • We unfortunately lose statistical power when we cluster
  • Clustering is more of a necessity than a choice
  • The intra-cluster correlation (ICC) is a measure of the similarity of individuals within the same cluster, compared to individuals in different clusters
  • The ICC is defined as:
    • \(ICC = \frac{\sigma^2_{between}}{\sigma^2_{between} + \sigma^2_{within}}\)
  • The measure goes from 0 to 1. When it is close to 0, cluster membership explains almost none of the variation in the outcome, so we can treat individuals as approximately independent
    • This is the ideal situation!
  • When it is close to 1, units within the same cluster have nearly identical outcomes
    • This is not good because it implies that units are so similar that the effective sample size is roughly equal to the number of clusters
  • In practice, ICCs fall somewhere between these two extremes, and the larger the ICC, the more we need to account for it
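
To make the ICC and the design effect concrete, here is a small worked sketch. It assumes equal-sized clusters and uses the standard design-effect formula \(1 + (m - 1) \times ICC\), where \(m\) is the cluster size. The slides do not write this formula out, so treat it as an added assumption. The variance components and sample sizes are made-up numbers.

```python
def icc(var_between, var_within):
    """Intra-cluster correlation from the two variance components."""
    return var_between / (var_between + var_within)


def design_effect(cluster_size, rho):
    """Design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * rho


def effective_n(n_clusters, cluster_size, rho):
    """Total sample size deflated by the design effect."""
    return n_clusters * cluster_size / design_effect(cluster_size, rho)


# Hypothetical variance components: between = 1, within = 9.
rho = icc(var_between=1.0, var_within=9.0)

base = effective_n(20, 30, rho)             # 20 clusters of 30 units
more_clusters = effective_n(40, 30, rho)    # double the number of clusters
bigger_clusters = effective_n(20, 60, rho)  # double the cluster size

print("ICC:", rho)                                        # 0.1
print("baseline:", round(base, 1))                        # about 153.8
print("twice the clusters:", round(more_clusters, 1))     # about 307.7
print("twice the cluster size:", round(bigger_clusters, 1))  # about 173.9
```

The two extremes match the bullets above: with an ICC of 0 the effective sample size equals the total number of units, and with an ICC of 1 it equals the number of clusters. The example also suggests that, for the same total sample size, adding clusters recovers far more effective observations than enlarging existing clusters.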

Clustering (cont.)

  • As we’ve seen, cluster randomised trials entail a series of specific challenges for standard estimation and testing methods
  • If randomisation is conducted at the cluster level, the uncertainty arising from this process is also at the cluster level
  • Cluster-robust standard errors can produce confidence intervals with the correct coverage. However, they are only reliable when the number of clusters is large
  • If cluster size (or any related characteristic) is linked to the magnitude of the effect, the estimator may be biased and adjustments are required
  • So, what can we do? 🤷🏻‍♂️
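
Before turning to remedies, a small simulation can illustrate why the uncertainty sits at the cluster level. The sketch below is not from the lecture: the function `simulate_once`, the number of clusters, and the variance parameters are all assumptions chosen for illustration. It runs many cluster-randomised experiments with a true effect of zero and compares the actual spread of the estimates with the naive standard error that treats individuals as independent.

```python
import random
import statistics


def simulate_once(rng, n_clusters=20, cluster_size=20,
                  sd_between=1.0, sd_within=1.5):
    """One cluster-randomised experiment with a true effect of zero."""
    # Randomly pick half of the clusters for treatment.
    clusters = list(range(n_clusters))
    rng.shuffle(clusters)
    treated = set(clusters[: n_clusters // 2])

    y_t, y_c = [], []
    for c in range(n_clusters):
        shock = rng.gauss(0, sd_between)  # shared by everyone in cluster c
        ys = [shock + rng.gauss(0, sd_within) for _ in range(cluster_size)]
        (y_t if c in treated else y_c).extend(ys)

    estimate = statistics.mean(y_t) - statistics.mean(y_c)
    # Naive standard error: ignores clustering entirely.
    naive_se = (statistics.variance(y_t) / len(y_t)
                + statistics.variance(y_c) / len(y_c)) ** 0.5
    return estimate, naive_se


rng = random.Random(385)
draws = [simulate_once(rng) for _ in range(500)]

true_sd = statistics.stdev([e for e, _ in draws])
avg_naive_se = statistics.mean([s for _, s in draws])

print("spread of estimates across simulations:", round(true_sd, 3))
print("average naive standard error:", round(avg_naive_se, 3))
```

With these assumed parameters the ICC is about 0.31, and the naive standard error should come out well below the actual spread of the estimates. That gap is the underestimated uncertainty the bullets above warn about.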

What to do in such situations?

  • One option is to increase the sample size to account for the loss of power due to clustering
  • This can be done by:
    • Adding more clusters
    • Increasing the number of units within each cluster
  • However, this can be challenging in practice, as it may not always be feasible to add more clusters or units
  • And this is where blocking comes in!
  • Blocking clusters on pre-treatment covariates reduces the variance of the estimate, which helps to mitigate the loss of power due to clustering
  • Imai et al. (2009) proposed a design strategy to improve the efficiency of cluster randomised trials
  • The strategy has three steps:
    • First, choose the causal quantity of interest (usually, the individual-level difference in means)
    • Then, identify available pre-treatment covariates likely to affect the outcome variable (blocks), and, if possible, pair clusters based on the similarity of these covariates and cluster sizes
      • They show that this step is usually overlooked, even though its efficiency gains can be worth many additional observations
    • Finally, within each pair, randomly assign one cluster to treatment and the other to control
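
The pairing and assignment steps can be sketched as follows. This is a deliberately crude illustration, not the matching procedure used by Imai et al. (2009). It assumes a single pre-treatment covariate, sorts clusters by that covariate and then by size, pairs neighbours, and flips a coin within each pair. The function name, the cluster names, and all the numbers are hypothetical.

```python
import random


def pair_match_and_assign(clusters, seed=0):
    """Pair similar clusters, then randomise treatment within each pair.

    clusters: list of dicts with keys 'name', 'covariate', and 'size'.
    With an odd number of clusters, the last one is left unpaired.
    """
    rng = random.Random(seed)

    # Simplifying assumption: neighbours in sorted order count as "similar".
    ordered = sorted(clusters, key=lambda c: (c["covariate"], c["size"]))

    pairs = []
    for i in range(0, len(ordered) - 1, 2):
        a, b = ordered[i], ordered[i + 1]
        # Coin flip decides which cluster in the pair is treated.
        treated, control = (a, b) if rng.random() < 0.5 else (b, a)
        pairs.append({"treated": treated["name"], "control": control["name"]})
    return pairs


# Hypothetical villages with a baseline outcome rate and population size.
clusters = [
    {"name": "A", "covariate": 0.62, "size": 40},
    {"name": "B", "covariate": 0.31, "size": 25},
    {"name": "C", "covariate": 0.60, "size": 38},
    {"name": "D", "covariate": 0.29, "size": 27},
    {"name": "E", "covariate": 0.45, "size": 33},
    {"name": "F", "covariate": 0.47, "size": 30},
]

pairs = pair_match_and_assign(clusters)
for p in pairs:
    print(p)
```

In this example the pairs are {D, B}, {E, F}, and {C, A}. Which cluster in each pair receives treatment depends on the seed. With several covariates, a distance-based matching method would be a more natural choice than sorting on one variable.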